@gareth-ellis
Member

@gareth-ellis gareth-ellis commented Aug 2, 2024

Future direction:

  • need to avoid messy nested fields
  • service_time should be the sum over all retries
  • follow the approach of other products, such as Logstash

@gbanasiak
Contributor

@gareth-ellis
Member Author

@elasticmachine update branch

@fressi-elastic fressi-elastic self-requested a review November 19, 2025 14:51
@gareth-ellis gareth-ellis marked this pull request as ready for review January 16, 2026 10:34
@gareth-ellis
Member Author

My approach so far seems to work. Please take a look.

@gareth-ellis
Member Author

@elasticmachine update branch

@gareth-ellis
Member Author

The failures are due to my change to the parameters of detailed_stats(): https://github.com/elastic/rally-tracks/blob/master/elastic/shared/runners/bulk.py#L36C25-L36C39

I guess I can either:
a) implement the retry in a new operation (RetryingBulk?),
b) avoid changing the parameters and pass it via params, though I would then need to avoid returning lines_to_retry, or
c) update elastic/logs, but that would still risk breaking other tracks that external users have written (if they follow the same approach as elastic/logs).

Thoughts?

Contributor

@gbanasiak gbanasiak left a comment

This is an interesting coupling problem with super().detailed_stats(...) used in elastic/logs. Track versioning follows ES versions only, not Rally versions, so changing the custom detailed_stats() as shown below will fail when it calls the parent's method in an older Rally.

    def detailed_stats(self, params, response, emit_lines_to_retry=False):
        stats = super().detailed_stats(params, response, emit_lines_to_retry)  # <--- HERE
        return {**stats, **params["param-source-stats"]}
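
The failure mode can be reproduced without Rally. The following is a minimal sketch, not Rally's actual code: OldBulkIndex and TrackBulkIndex are hypothetical stand-ins for the runner in an older Rally release and for a track runner that has already been updated for the new signature.

```python
class OldBulkIndex:
    """Stand-in for the runner in a Rally release that predates the new argument."""

    def detailed_stats(self, params, response):
        return {"success-count": len(response.get("items", []))}


class TrackBulkIndex(OldBulkIndex):
    """Stand-in for a track runner already updated for the new signature."""

    def detailed_stats(self, params, response, emit_lines_to_retry=False):
        # The parent accepts only (params, response), so this call raises TypeError.
        stats = super().detailed_stats(params, response, emit_lines_to_retry)
        return {**stats, **params["param-source-stats"]}


def call_track_runner():
    runner = TrackBulkIndex()
    try:
        runner.detailed_stats({"param-source-stats": {}}, {"items": []})
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"
```

Calling call_track_runner() returns a string starting with "TypeError", because the extra positional argument does not fit the older parent signature.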

It would be best if we minimized the surface between track code and Rally code. Our docs only mention the top-level runner call with the es and params arguments. Because of this, I would be happy to declare what this particular runner is doing as unsupported. I would even go as far as clarifying this in the documentation.

My vote would be to change the elastic/logs custom runner to avoid calling detailed_stats() completely, and to backport this to all branches where the modified Rally might be used, so I'd say all 8.x branches up until now.

Something like this perhaps?

class RawBulkIndex(BulkIndex):
    async def __call__(self, es, params):
        meta_data = await super().__call__(es, params)
        if params.get("detailed-results", False):
            meta_data.update(params["param-source-stats"])
        return meta_data
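
To check the behaviour of that suggestion in isolation, here is a small sketch that runs it against a stub parent. The BulkIndex class below is a stand-in that returns canned stats (the real runner talks to Elasticsearch), and the "raw-size-bytes" key is a made-up example of what param-source-stats might carry.

```python
import asyncio


class BulkIndex:
    """Stand-in for Rally's BulkIndex; returns canned stats instead of calling ES."""

    async def __call__(self, es, params):
        return {"success-count": 2, "error-count": 0}


class RawBulkIndex(BulkIndex):
    async def __call__(self, es, params):
        # Only the documented top-level call is used, never detailed_stats().
        meta_data = await super().__call__(es, params)
        if params.get("detailed-results", False):
            meta_data.update(params["param-source-stats"])
        return meta_data


def run_once(detailed):
    params = {
        "detailed-results": detailed,
        "param-source-stats": {"raw-size-bytes": 128},  # hypothetical key
    }
    return asyncio.run(RawBulkIndex()(es=None, params=params))
```

Because the subclass only depends on the (es, params) call and its returned dictionary, it should be unaffected by signature changes to the parent's internal helper methods.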

The alternative would be to keep detailed_stats() as is and determine the documents to retry separately, but that means iterating through the response twice, which would impact processing time.

I'm curious about @fressi-elastic's thoughts on that one as well.

I'm adding further comments below.


I think we also planned on adding exponential back-off. Is this still the plan, or is it out of scope for this PR?
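
For reference, exponential back-off around a retry loop could look roughly like the sketch below. It is an illustration only: backoff_delay, send_with_backoff, and the `send` coroutine (assumed to return the lines rejected with a 429) are hypothetical names, and the default base, factor, and cap values are arbitrary.

```python
import asyncio
import random


def backoff_delay(attempt, base=0.5, factor=2.0, cap=30.0, jitter=0.0):
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(cap, base * factor**attempt)
    # Optional jitter spreads out clients that were throttled at the same time.
    return delay + random.uniform(0, jitter * delay)


async def send_with_backoff(send, lines, max_retries=5, sleep=asyncio.sleep):
    """Call `send` until nothing is left to retry or the retries run out.

    `send` is a hypothetical coroutine returning the lines rejected with a 429.
    """
    retry_count = 0
    lines_to_retry = await send(lines)
    while lines_to_retry and retry_count < max_retries:
        await sleep(backoff_delay(retry_count))
        lines_to_retry = await send(lines_to_retry)
        retry_count += 1
    return lines_to_retry, retry_count
```

The `sleep` parameter is injectable so that the loop can be exercised in tests without real waiting.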

break
self.logger.warning("Retrying %d documents that previously resulted in a 429.", len(lines_to_retry) / 2)
api_kwargs["body"] = lines_to_retry
bulk_size = len(lines_to_retry) / 2 # at this point the data always contains action metadata.
Contributor

at this point the data always contains action metadata

This is slightly off-topic:

I've spent some time digging once I saw this comment. I think the bulk runner always receives action-and-metadata lines in the body param today (see here). If the corpus does not include them, they are generated in earlier processing stages. I don't quite understand this part of the above code:

        if with_action_metadata:
            api_kwargs.pop("index", None)
            # only half of the lines are documents
            response = await es.bulk(params=bulk_params, **api_kwargs)
        else:
            response = await es.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs)

The "only half of the lines are documents" comment suggests the else clause is different, but it isn't. There's nothing in the bulk() method of the ES client that would magically add action-and-metadata lines. Also, I think doc_type is ignored; it's a leftover from old ES versions.

I think we could simplify / remove this.
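
If the body always carries action-and-metadata lines, the document count follows directly from the line count. A minimal sketch, assuming the body is available as a list of lines and contains only actions that are followed by a document line (index or create, not delete); count_documents is a hypothetical helper, not an existing Rally function:

```python
def count_documents(body_lines):
    """Number of documents in a bulk body made of action/document line pairs."""
    if len(body_lines) % 2 != 0:
        raise ValueError("expected an action-and-metadata line before every document")
    # Integer division keeps the count an int; `/` would yield a float.
    return len(body_lines) // 2
```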

Member Author

Yeah, it seems that at one point we calculated the number of documents here, but that was removed and the comment was left behind. I'll remove the comment for now. I think I would need to test a little more to see whether we have code paths that use with_action_metadata or not.

Comment on lines +565 to +586
if detailed_results:
    stats = {
        "success-count": total_success,
        "error-count": total_error,
        "retry-count": retry_count,
        "took": total_time,
        "success": len(lines_to_retry) == 0,
        "retried": retry_count > 0,
        "bulk-request-size-bytes": sum_bulk_request_size_bytes,
        "total-document-size-bytes": sum_total_document_size_bytes,
        "ops": {},  # detailed per-op stats are not aggregated over retries
        "shards_histogram": [],  # detailed per-shard stats are not aggregated over retries
    }
else:
    stats = {
        "success-count": total_success,
        "error-count": total_error,
        "retry-count": retry_count,
        "took": total_time,
        "success": len(lines_to_retry) == 0,
        "retried": retry_count > 0,
    }
Contributor

This exposes the details of stats at the __call__() level, which were previously hidden in either simple_stats() or detailed_stats(). Can we avoid this? We could have a method that iterates through the response documents and:

  • calls another method (passed as an argument) that increases stats counters for each document,
  • builds a retry list (optionally).
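
A rough sketch of that shape, to make the proposal concrete. All names here (process_bulk_items, simple_counter) are hypothetical, and it assumes the body is a list alternating action-and-metadata and document lines in the same order as the items of the bulk response:

```python
def process_bulk_items(response, body_lines, update_stats, collect_retries=False):
    """Walk the bulk response once, updating stats and optionally collecting retries.

    Assumes `body_lines` alternates action-and-metadata and document lines, in
    the same order as `response["items"]`.
    """
    lines_to_retry = []
    for idx, item in enumerate(response["items"]):
        # Each item is a single-key dict such as {"index": {"status": 201, ...}}.
        op, data = next(iter(item.items()))
        update_stats(op, data)
        if collect_retries and data.get("status") == 429:
            lines_to_retry.extend(body_lines[2 * idx : 2 * idx + 2])
    return lines_to_retry


def simple_counter(stats):
    """Build a callback that only tracks success and error counts."""

    def update(op, data):
        if data.get("status", 0) > 299 or "error" in data:
            stats["error-count"] += 1
        else:
            stats["success-count"] += 1

    return update
```

A detailed variant would pass a different callback that also fills per-op and per-shard counters, so __call__() would not need to know which stats are being gathered and the response would still be traversed only once.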
